Spoken Arabic Dialect Identification Using Phonotactic Modeling
Abstract
The Arabic language is a collection of multiple variants, among which Modern Standard Arabic (MSA) has a special status as the formal written standard of the media, culture and education across the Arab world. The other variants are informal spoken dialects that serve as the medium of communication in daily life. Arabic dialects differ substantially from MSA and from each other in phonology, morphology, lexical choice and syntax. In this paper, we describe a system that automatically identifies the Arabic dialect (Gulf, Iraqi, Levantine, Egyptian or MSA) of a speaker given a sample of his/her speech. The phonotactic approach we use proves effective in identifying these dialects, reaching an overall accuracy of 81.60% on 30-second test utterances.
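The abstract does not spell out the model, so the following is only a minimal sketch of one common form of phonotactic identification, not the authors' system. It assumes each utterance has already been decoded into a sequence of phone labels by some phone recognizer, trains one add-one-smoothed phone bigram model per dialect, and picks the dialect whose model gives the test sequence the highest log-likelihood. The dialect names, phone strings and function names are invented for illustration.

```python
import math
from collections import Counter

START, END = "<s>", "</s>"  # hypothetical sequence boundary markers


def train_bigram(sequences):
    """Count phone bigrams and their left contexts for one dialect."""
    bigrams, contexts = Counter(), Counter()
    for seq in sequences:
        padded = [START, *seq, END]
        for prev, cur in zip(padded, padded[1:]):
            bigrams[(prev, cur)] += 1
            contexts[prev] += 1
    return bigrams, contexts


def log_likelihood(seq, model, vocab_size):
    """Log-probability of a phone sequence under an add-one-smoothed bigram model."""
    bigrams, contexts = model
    padded = [START, *seq, END]
    total = 0.0
    for prev, cur in zip(padded, padded[1:]):
        prob = (bigrams[(prev, cur)] + 1) / (contexts[prev] + vocab_size)
        total += math.log(prob)
    return total


def identify(seq, models, vocab_size):
    """Return the dialect whose phonotactic model scores the sequence highest."""
    return max(models, key=lambda d: log_likelihood(seq, models[d], vocab_size))


# Toy, made-up phone sequences; real input would come from a phone recognizer.
training = {
    "dialect_A": [["q", "a", "l"], ["q", "a", "b"], ["q", "u", "l"]],
    "dialect_B": [["g", "a", "l"], ["g", "a", "b"], ["g", "u", "l"]],
}

# Every symbol that can be predicted: all observed phones plus the end marker.
vocab = {p for seqs in training.values() for s in seqs for p in s} | {END}
models = {d: train_bigram(seqs) for d, seqs in training.items()}

print(identify(["q", "a", "l"], models, len(vocab)))
print(identify(["g", "u", "b"], models, len(vocab)))
```

Bigrams with add-one smoothing keep the sketch short; a practical system would more likely use higher-order n-grams, better smoothing, and several phone recognizers whose scores are combined.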
Similar resources
Using prosody and phonotactics in Arabic dialect identification
While Modern Standard Arabic is the formal spoken and written language of the Arab world, dialects are the major communication mode for everyday life; identifying a speaker’s dialect is thus critical to speech processing tasks such as automatic speech recognition, as well as speaker identification. We examine the role of prosodic features (intonation and rhythm) across four Arabic dialects: Gul...
Multi-view Dimensionality Reduction for Dialect Identification of Arabic Broadcast Speech
In this work, we present a new Vector Space Model (VSM) of speech utterances for the task of spoken dialect identification (DID). Generally, DID systems are built using two sets of features extracted from speech utterances: acoustic and phonetic. The acoustic and phonetic features are used to form vector representations of speech utterances in an attempt to encode information about the spoke...
Arabic Dialect Identification - 'Is the Secret in the Silence?' and Other Observations
Conversational telephone speech (CTS) collections of Arabic dialects distributed through the Linguistic Data Consortium (LDC) provide an invaluable resource for the development of robust speech systems including speaker and speech recognition, translation, spoken dialogue modeling, and information summarization. They are frequently relied on also in language (LID) and dialect identification (DID...
QMDIS: QCRI-MIT Advanced Dialect Identification System
As a continuation of our efforts towards tackling the problem of spoken Dialect Identification (DID) for Arabic languages, we present the QCRI-MIT Advanced Dialect Identification System (QMDIS). QMDIS is an automatic spoken DID system for Dialectal Arabic (DA). In this paper, we report a comprehensive study of the three main components used in the spoken DID task: phonotactic, lexical and acous...
Chinese dialect identification using an acoustic-phonotactic model
In this paper we develop hidden Markov model (HMM) based approaches to identify Chinese dialects spoken in Taiwan. This task can be aided by exploiting various characteristic features of Chinese spoken languages. The baseline system performs phonotactic analysis after the speech utterance is tokenized into a sequence of five broad phonetic classes. The sequential statistics of the resulting sym...